What do we learn from correlations of local and global network properties? 




Magnus Jungsbluth, Bernd Burghardt and Alexander K. Hartmann
Institut für Theoretische Physik, Universität Göttingen,
Friedrich-Hund-Platz 1, D-37077 Göttingen, Germany
(Dated: February 2, 2008) 

In complex networks a common task is to identify the most important or "central" nodes. There are several definitions, often called centrality measures, which frequently lead to different results. Here we extensively study correlations between four local and global measures, namely the degree, the shortest-path betweenness, the random-walk betweenness and the subgraph centrality. We study them on different random-network models, such as Erdős–Rényi, Small-World and Barabási–Albert, and on different real networks, such as metabolic pathways, social collaborations and computer networks. The correlations differ considerably between the real networks and the model networks, which raises the question of whether the models really reflect all important properties of the real world.
I. INTRODUCTION

Theories of complex networks have attracted much attention in the last few years. Research on networks started in the social sciences and now spans disciplines ranging from the social sciences through biology to physics. There are studies, for example, on analytical properties of certain network models, on the attack vulnerability of real networks […] and on the prediction of epidemics […], to name just a few. Extensive reviews are given in Refs. […].

The main focus has been on the so-called scale-free networks, which have a degree distribution $P(k)$ obeying a power law. Here $P(k)$ is the probability for a node to have $k$ edges to other nodes. The exponent $\gamma$ of these power laws, $P(k) \propto k^{-\gamma}$, is typically close to 3 for many real networks.

There have been some attempts to explain this behavior based on the seminal work of Barabási and Albert […]. They explain the scaling behavior by a preferential attachment mechanism during network growth. These models are based on the notion that the "importance" of a node (often called its centrality) is in some cases given by its number of connections. Nevertheless, it seems clear that not all properties of a complex real-world system can be explained by models based on this ingenious yet simple mechanism.

Since the degree is a very local measure on a network, it is not necessarily the best choice to characterize all types of networks. Over the years a few other measures for the importance of nodes have been proposed […] that actually measure global properties of the whole structure. In these publications, examples are shown where some nodes in a network have a small degree, yet play an important role for the network. Hence, these more global measures may deserve more attention. Nevertheless, we are not aware of a thorough comparison of these measures on different model and real-world networks. Because relatively few studies have used these more complex measures, it is so far unclear whether they are indeed better suited to identify important nodes in networks.

For a given network, the different measures may be strongly correlated. In that case a node found to be highly important under one measure also appears important under another measure, and vice versa. If this were generally true for all networks, it would be sufficient to study just one measure. For example, if all measures were strictly monotonic and simple functions of the degree, then the degree would indeed be the key quantity to study.

If, on the other hand, different measures are not strictly correlated, then nodes that obtain a high importance under several different measures can indeed be regarded as key nodes of a given network. It might also be that there are nodes which obtain a high value for one measure but not for another. In this case, either one measure is not suitable for describing the given network or, if this is not systematically true, these nodes have to be studied more closely to understand the network's behavior. In any case, studying the correlations of different local and global properties of nodes appears to be a promising way to understand networks much better than just looking at the distributions of single, perhaps even purely local, properties.

For this paper, we have systematically studied several local and global network measures for different types of network models and for a couple of networks describing real-world data. As in some previous studies, we find that the distributions of single measures in most cases show the well-known scale-free behavior if the network's degree distribution is scale-free. Nevertheless, in many cases the standard network models are not capable of reproducing the complicated correlation signatures we find here in the real-world data. Hence, we propose the systematic study of these correlations as a much better tool to study networks. We also propose a comparison of these correlations as a suitable criterion to evaluate the validity of network models.



This paper is organized as follows. In Section II we introduce the centrality measures we have used in our studies. Section III gives an overview of the random networks we have considered. In Section IV we present the real networks that we have studied and how they have been constructed. In Section V we show our results, and in Section VI we give an outlook on possible future directions of research.
II. CENTRALITY MEASURES



In mathematical language a network (also often called a graph) is a pair $G = (V, E)$. It consists of a discrete set $V$ of nodes (also called vertices) and a discrete set of edges $E \subseteq V \times V$. We are only interested in undirected networks, so an edge $e = \{i, j\}$ is a 2-set containing the two nodes $i, j$ it connects.

A component of a network is a subset of nodes with two properties. First, each node is reachable from every other node via some path (i.e. a connected sequence of edges). Second, it is impossible to add another node from $V$ without breaking the first requirement. In that sense a component is a maximal subset. A network may consist of more than one component, but we are mainly interested in networks that consist of one component. Networks with more than one component can be decomposed into a set of smaller one-component networks.

In the following, $n = |V|$ denotes the number of nodes and $m = |E|$ the number of edges. We assume an arbitrary but fixed order on the set of nodes, so that they can be enumerated. Each node therefore has a natural index.
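The definition of a component translates directly into a breadth-first search. The following Python sketch is illustrative only and is not the authors' code. It assumes that a network is stored as a dict mapping each node to the set of its neighbors, a representation chosen here for convenience. It returns the components ordered by size, from which the largest one can be extracted.

```python
from collections import deque

def components(adj):
    """Split an undirected graph into its components, largest first.

    adj: dict mapping each node to the set of its neighbours (assumed representation).
    A component is a maximal set of nodes that are mutually reachable via paths.
    """
    seen = set()
    comps = []
    for start in adj:
        if start in seen:
            continue
        # breadth-first search collects everything reachable from `start`
        seen.add(start)
        comp = {start}
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    comp.add(w)
                    queue.append(w)
        comps.append(comp)
    # sorted() is stable, so equally large components keep their discovery order
    return sorted(comps, key=len, reverse=True)

def largest_component(adj):
    """Restrict the graph to its largest component."""
    keep = components(adj)[0]
    return {v: adj[v] & keep for v in keep}

# usage: a single edge, a 3-node path and an isolated node
g = {0: {1}, 1: {0}, 2: {3, 4}, 3: {2}, 4: {2}, 5: set()}
print([len(c) for c in components(g)])   # [3, 2, 1]
```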

The most prominent centrality measure of a network is the so-called degree: the number of edges incident to a node, i.e. its number of neighbors. It can be calculated in $O(1)$ if an appropriate network representation is used. The degree has been used very often to describe the importance of a node. For example, consider computer networks, where the computers are represented by nodes and the physical network connections by links. There, routers and servers, which play a central role in these systems, are connected to many other computers. Hence, networks are often characterized by their degree distribution. The class of scale-free networks, i.e. networks with a power-law degree distribution, has been in the focus of interest because many real-world networks exhibit a scale-free degree distribution.
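A minimal illustration of why the degree is cheap to obtain follows. It assumes a dict from each node to its neighbor set, which is our assumption and not necessarily the representation used in the paper. With that representation the degree is simply the size of the neighbor set. The empirical degree distribution $P(k)$ is then the fraction of nodes with degree $k$.

```python
from collections import Counter

def degree_distribution(adj):
    """Empirical degree distribution P(k) for an adjacency dict node -> set(neighbours).

    With neighbour sets, len() yields each node's degree in O(1).
    """
    n = len(adj)
    counts = Counter(len(neighbours) for neighbours in adj.values())
    return {k: c / n for k, c in sorted(counts.items())}

# usage: a star with one hub and three leaves
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(degree_distribution(star))   # {1: 0.75, 3: 0.25}
```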

On the other hand, the degree is just a local measure of the centrality of a node. Consider, for example, a motorway network, where the nodes represent junctions and the edges represent routes. There can be very important junctions that connect only a few routes, yet a breakdown of one of them leads to major traffic congestion. Hence, other measures have been introduced that are intended to reflect the global importance of the nodes for a network.

A measure of centrality that takes advantage of the global structure of a network is the shortest-path betweenness, or simply betweenness, of a node $i$. It is defined as the fraction of shortest paths between all possible pairs of nodes of the network that pass through node $i$. Let $g_i^{(st)}$ be the number of shortest paths between nodes $s$ and $t$ running through node $i$, and let $n^{(st)}$ be the total number of shortest paths between $s$ and $t$. Then the betweenness $b_i$ of node $i$ is given by

$$ b_i = \frac{2}{n(n-1)} \sum_{s<t} \frac{g_i^{(st)}}{n^{(st)}} . $$

The normalization $n(n-1)/2 = \sum_{s<t} 1$ ensures that the betweenness lies between zero and one. This measure was introduced in the social sciences quite a while ago (see […]). The algorithm we use to calculate the betweenness is presented in Ref. […]. It has a time complexity of $O(mn)$, so it can handle rather large networks efficiently.
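The sketch below shows how the betweenness formula can be evaluated in $O(mn)$ time on an unweighted graph. We do not know which exact algorithm the cited reference gives. This version uses Brandes-style dependency accumulation, one breadth-first search per source, which is a known $O(mn)$ method for unweighted graphs. The adjacency-dict input (node to neighbor set) is an assumption, and the graph is assumed connected with $n \ge 2$.

```python
from collections import deque

def betweenness(adj):
    """Shortest-path betweenness b_i of every node of an undirected, unweighted graph.

    adj: dict node -> set of neighbours (assumed connected, n >= 2).
    Brandes-style dependency accumulation: one BFS per source, O(mn) overall.
    """
    nodes = list(adj)
    n = len(nodes)
    b = {v: 0.0 for v in nodes}
    for s in nodes:
        # BFS from s: dist = hop distance, sigma = number of shortest s-v paths
        dist = {v: -1 for v in nodes}
        sigma = {v: 0 for v in nodes}
        preds = {v: [] for v in nodes}
        dist[s], sigma[s] = 0, 1
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # walk back from the farthest nodes, accumulating g_i^(st)/n^(st) over all t
        delta = {v: 0.0 for v in nodes}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                b[w] += delta[w]
    # every unordered pair {s, t} was counted twice (from s and from t), so
    # 2 / (n(n-1)) * (total / 2) = total / (n(n-1))
    return {v: b[v] / (n * (n - 1)) for v in nodes}

# usage: node 1 lies on the only shortest path between 0 and 2, so b_1 = 1/3
path = {0: {1}, 1: {0, 2}, 2: {1}}
print(betweenness(path))
```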

The rationale behind the betweenness is that the flow of information, goods, etc. (depending on the type of network) is directed in a deterministic way. In particular, the full network structure must be known for each decision. However, if, e.g., all people decide to take the same single shortest route to the center of a city, the overall travel times might become large.

Also, there may be networks where the information flow is not controlled externally or deterministically and the full network structure is not known to all players. Social networks are an example, with nodes representing persons and edges representing personal relations. A recent proposal by Newman [9], the so-called random-walk betweenness (RDW betweenness), models the fact that individual nodes do not "know" the whole structure of the network. A global-optimum assumption is therefore not very convincing.

Within this approach, random walks through the network are used as the basis for calculating the centrality of each node. The random-walk betweenness of a node $j$ is the fraction of random walks between node $s$ and node $t$ that pass through $j$, averaged over all possible pairs of source node $s$ and target node $t$. Loops within the random walks are excluded by using probability flows to calculate the actual RDW betweenness. After a simple calculation [9] one arrives at an algorithm that proceeds as follows:



1. Construct the adjacency matrix $A$ and the degree matrix $D$:

$$ A_{ij} = \begin{cases} 1, & \text{iff edge } \{i, j\} \text{ exists} \\ 0, & \text{else} \end{cases} \qquad D_{ij} = \begin{cases} k_j = \sum_{i'} A_{i'j}, & i = j \\ 0, & \text{else} \end{cases} $$

2. Calculate the matrix $D - A$.

3. Remove the last row and column, so that the matrix becomes invertible (each of the equations is redundant given the remaining ones).

4. Invert the matrix, add a row and a column consisting of zeros, and call the resulting matrix $T$.

Note that the quantities calculated so far depend neither on $i$ nor on $s, t$. The random-walk betweenness $b_i$ of node $i$ can now be calculated by

$$ b_i = \frac{2}{n(n-1)} \sum_{s<t} I_i^{(st)} , $$

where

$$ I_i^{(st)} = \frac{1}{2} \sum_j A_{ij} \left| T_{is} - T_{it} - T_{js} + T_{jt} \right| $$

if $i \neq s$ and $i \neq t$, and $I_i^{(st)} = 1$ if $i$ is equal to $s$ or $t$.

Note that although the RDW betweenness is based on a random process, its calculation is not random at all. Hence, any scatter observed in the data is due to the network's structure, not to fluctuations of the measurement. The RDW betweenness can be calculated with time complexity $O((m+n)n^2)$. The drawback is the considerable amount of memory needed, since the algorithm uses an adjacency matrix and other matrices of the same dimension. The memory consumption is therefore of order $O(n^2)$. Sparse-matrix methods could improve the situation, since most networks have sparse adjacency matrices, but they would worsen the time complexity, which is not desirable.
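The four matrix steps and the current formula can be followed almost literally with NumPy, as in the sketch below. It assumes a dense, symmetric 0/1 adjacency matrix of a connected graph. It deliberately uses a naive double loop over all pairs $s<t$, so it costs roughly $O(n^4)$ and is meant for small illustrative graphs. It is not the $O((m+n)n^2)$ implementation mentioned in the text. The column differences $T_{is} - T_{it}$ play the role of node "voltages" for a unit current injected at $s$ and removed at $t$.

```python
import numpy as np

def rdw_betweenness(A):
    """Random-walk betweenness from a symmetric 0/1 adjacency matrix of a connected graph.

    Naive sketch: builds T via the four matrix steps, then sums I_i^(st) over all s < t.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    D = np.diag(A.sum(axis=1))                    # step 1: degree matrix
    M = D - A                                     # step 2
    T = np.zeros((n, n))
    T[:-1, :-1] = np.linalg.inv(M[:-1, :-1])      # steps 3-4: drop last row/col, invert, pad with zeros
    b = np.zeros(n)
    for s in range(n):
        for t in range(s + 1, n):
            V = T[:, s] - T[:, t]                 # V_i = T_is - T_it
            # I_i = 1/2 sum_j A_ij |V_i - V_j| = 1/2 sum_j A_ij |T_is - T_it - T_js + T_jt|
            I = 0.5 * (A * np.abs(V[:, None] - V[None, :])).sum(axis=1)
            I[s] = I[t] = 1.0                     # source and target count as 1
            b += I
    return b / (0.5 * n * (n - 1))

# usage: on a 3-node path every walk between the end nodes must cross the middle node
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
print(rdw_betweenness(path))   # [2/3, 1, 2/3] up to rounding
```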

The fourth measure we use in this study is the subgraph centrality (SC). It is based on the idea that the importance of a node should depend on its participation in local closed walks, with a closed walk contributing less the longer it is. The number of closed walks of length $k$ that start and end at node $i$ is given by the local spectral moments $\mu_k(i)$ of the network's adjacency matrix $A$, which are defined as

$$ \mu_k(i) = \left( A^k \right)_{ii} . $$

The SC of node $i$ is then defined as

$$ C_S(i) = \sum_{k=0}^{\infty} \frac{\mu_k(i)}{k!} . $$

Although the series can be calculated directly, doing so would not be very efficient. It is shown in […] that one can instead calculate the eigenvalues $\lambda_j$ of the adjacency matrix and an orthonormal basis of eigenvectors $\mathbf{v}_j$. The subgraph centrality $C_S$ of node $i$ can then be calculated via

$$ C_S(i) = \sum_{j=1}^{n} \left( v_j^{(i)} \right)^2 e^{\lambda_j} , $$

where $v_j^{(i)}$ denotes the $i$-th component of $\mathbf{v}_j$.
This measure generally produces values of large magnitude and is not bounded. We tried to normalize by the value of $C_S(i)$ for a fully connected graph with the same number of vertices. There all vertices are equivalent, so every vertex has the same subgraph centrality. However, for graphs larger than 5000 vertices this normalization constant exceeded machine precision, and it was much larger than the values we observed for the networks under consideration. Hence, we used the non-normalized values.
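The equivalence between the defining series and the eigenvector formula can be checked numerically. In the sketch below, NumPy's `eigh` supplies the orthonormal eigenbasis of the symmetric adjacency matrix. A truncated series (the cut-off `kmax` is an arbitrary choice of this sketch) evaluates the definition directly for comparison.

The fully connected graph $K_n$ is used as a test case. Its eigenvalues are $n-1$ (once) and $-1$ ($n-1$ times), so $C_S(i) = e^{n-1}/n + (1 - 1/n)\,e^{-1}$ for every node. This grows exponentially with $n$, which is why the normalization discussed above becomes impractical for large graphs.

```python
import numpy as np

def subgraph_centrality(A):
    """C_S(i) = sum_j (v_j^(i))^2 exp(lambda_j), via the eigendecomposition of symmetric A."""
    lam, V = np.linalg.eigh(np.asarray(A, dtype=float))  # columns of V: orthonormal eigenvectors
    return (V ** 2) @ np.exp(lam)

def subgraph_centrality_series(A, kmax=40):
    """Direct evaluation of the truncated series sum_{k=0}^{kmax} mu_k(i) / k!."""
    A = np.asarray(A, dtype=float)
    term = np.eye(A.shape[0])          # A^0 / 0!
    total = term.copy()
    for k in range(1, kmax + 1):
        term = term @ A / k            # A^k / k!
        total += term
    return np.diag(total)              # diagonal entries: sum_k (A^k)_ii / k!

K4 = np.ones((4, 4)) - np.eye(4)       # complete graph on 4 nodes
print(subgraph_centrality(K4))         # every entry equals e^3/4 + (3/4) e^(-1)
```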



III. RANDOM-NETWORK MODELS 

We compared the different measures on different random-network models, namely the Erdős–Rényi (ER) model […], the Small-World (SW) model […] and the Barabási–Albert (BA) model […]. The ER model consists of random networks with a fixed number of nodes $n$, in which an edge is added between each pair of nodes with probability $p$. The degree distribution of this model is binomial and, for large $n$, approximately Poissonian.
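A minimal generator for the ER ensemble described above is sketched below. It uses the same assumed adjacency-dict representation (node to neighbor set) and Python's `random.Random` for reproducible seeding. Looping only over pairs $i < j$ rules out self-loops and parallel edges by construction.

```python
import random

def erdos_renyi(n, p, seed=None):
    """G(n, p): each of the n(n-1)/2 node pairs gets an edge independently with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for i in range(n):
        for j in range(i + 1, n):   # i < j: no self-loops, no parallel edges
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj

g = erdos_renyi(500, 0.05, seed=1)
mean_degree = sum(len(nb) for nb in g.values()) / len(g)
print(mean_degree)   # close to p * (n - 1) = 24.95
```

The expected degree is $p(n-1)$. Before measuring, the largest component would still have to be extracted, as the paper does for its ER networks.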

The SW model is also characterized by a fixed number of nodes $n$, but here the nodes are placed on a regular grid. An instance is generated in two steps. First, each node is connected to its $k$ nearest neighbors. Second, with probability $p$ each edge is reconnected to a randomly chosen node: one end of the edge is moved while the other end remains. Most SW networks studied are based on a one-dimensional grid with periodic boundary conditions, i.e. the nodes are arranged on a circle. The degree distribution of these networks interpolates between a delta peak at $k$ for $p = 0$ and a Poissonian distribution for $p \to 1$.
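The two-step SW construction on a ring can be sketched as follows. Some details are assumptions rather than statements from the text. "$k$ nearest neighbors" is taken to mean $k/2$ on each side, so $k$ must be even and much smaller than $n$. The endpoint that stays fixed during rewiring is the first one listed. A rewiring target that would create a self-loop or a parallel edge is excluded, in line with the paper's rule that such edges are prohibited.

```python
import random

def small_world(n, k, p, seed=None):
    """Ring of n nodes, each linked to its k nearest neighbours (k/2 per side, k even,
    k << n assumed); then each original edge is rewired with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    ring_edges = [(i, (i + d) % n) for i in range(n) for d in range(1, k // 2 + 1)]
    for i, j in ring_edges:
        adj[i].add(j)
        adj[j].add(i)
    for i, j in ring_edges:
        if rng.random() < p:
            # keep end i, move end j to a random node, avoiding self-loops and parallel edges
            candidates = [w for w in range(n) if w != i and w not in adj[i]]
            if candidates:
                new = rng.choice(candidates)
                adj[i].discard(j)
                adj[j].discard(i)
                adj[i].add(new)
                adj[new].add(i)
    return adj

g = small_world(200, 8, 0.05, seed=1)
print(sum(len(nb) for nb in g.values()) / len(g))   # 8.0: rewiring preserves the edge count
```

Because each rewiring removes one edge and adds one, the mean degree stays exactly $k$, while individual degrees spread out as $p$ grows.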

The BA model is the only growth model studied here. In this case the networks are created by a so-called preferential attachment mechanism. Each generated random network starts with $m$ nodes, and new nodes are added consecutively, one after the other. A new node is immediately connected to exactly $m$ of the already existing nodes, which are chosen randomly: the higher the degree $k$ of an existing node, the bigger the chance that it is selected as a neighbor. Hence, the probability for a node $i$ to be selected is its degree $k_i$ divided by the sum $\sum_j k_j$ of the degrees of all currently existing nodes of the network.

To generate these networks efficiently, we used a list in which each node $i$ is contained $k_i$ times. For each newly added node we select $m$ different elements randomly from the list and connect them to the new node. The resulting degree distribution follows a power law with exponent $\gamma = 3$ in the limit of large degrees (in the tail of the distribution).
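The list-based sampling described above can be sketched like this. One detail is an assumption of this sketch. The text does not say how the very first new node attaches, since the $m$ seed nodes initially have degree zero and the list is empty. Here that first node is simply linked to all $m$ seed nodes, which is the only way to pick $m$ distinct targets at that point. Distinctness of the $m$ targets is enforced by redrawing, so no parallel edges arise.

```python
import random

def barabasi_albert(n, m, seed=None):
    """Preferential attachment via a list in which node i appears k_i times."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(m)}       # m seed nodes
    targets_list = []                        # node i appears k_i times
    for new in range(m, n):
        if not targets_list:
            chosen = set(range(m))           # assumption: first new node links to all seed nodes
        else:
            chosen = set()
            while len(chosen) < m:           # m distinct nodes, probability proportional to degree
                chosen.add(rng.choice(targets_list))
        adj[new] = set()
        for v in chosen:
            adj[new].add(v)
            adj[v].add(new)
            targets_list += [v, new]         # both endpoints gain one degree
    return adj

g = barabasi_albert(2000, 8, seed=1)
print(sum(len(nb) for nb in g.values()) // 2)   # 15936 = (2000 - 8) * 8 edges
```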

Different exponents in the tail can be obtained by adding an offset $k_0$ to the probability of selecting a vertex, so that the total probability goes as $k + k_0$. This yields an exponent $\gamma = 3 + k_0/m$ in the tail of the distribution […]. The offset $k_0$ may explicitly be negative, as long as $-m < k_0 < \infty$.
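As a worked instance of $\gamma = 3 + k_0/m$, take $m = 2$ and $k_0 = -1$, the parameters that give the best fit to the PIN degree distribution in Sec. V. The plain BA case and the lower end of the allowed range for $k_0$ are shown for comparison:

$$
\gamma = 3 + \frac{k_0}{m}: \qquad
\begin{aligned}
m = 2,\ k_0 = -1 &\;\Rightarrow\; \gamma = 3 - \tfrac{1}{2} = 2.5 , \\
k_0 = 0 &\;\Rightarrow\; \gamma = 3 \quad \text{(plain BA model)} , \\
k_0 \to -m &\;\Rightarrow\; \gamma \to 2 .
\end{aligned}
$$

So within the allowed range $-m < k_0 < \infty$ the tail exponent can take any value $\gamma > 2$.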

For all random networks we prohibited parallel edges and self-loops. For the BA model this means that each node $i$ can be selected from the list only once per newly added node. Additionally, we extracted the largest component of the ER and SW networks. Note that the BA model generates connected networks.



IV. REAL NETWORKS 

It is well known that the models presented in the last section are able to reproduce some of the characteristics of real-world networks. For many applications the most realistic models are the BA model and related models based on growth mechanisms. These reproduce the power-law behavior of the distributions of the degree and of some other centrality measures. As indicated above, we propose in this paper to go beyond measuring distributions of local or global properties by considering correlations between different measures. To investigate whether these most common models can also reproduce these complex characteristics of real-world networks, we have to compare them with the results for at least some real-world networks.

We took data from publicly available databases, as listed below. In all cases, we treated the network as undirected and unweighted. In some cases this is not a good model, but we chose to do so in order to examine all networks in exactly the same way. Whenever a network consisted of more than one component, we used only its largest component, since the random-walk betweenness in particular is not properly defined on a network with more than one component. Additionally, we eliminated all self-loops (edges connecting a node to itself) and parallel edges in the real-world networks, where present.
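The direction, weight, self-loop and parallel-edge handling in this preprocessing can be expressed as a single pass over an edge list. The sketch below assumes the raw data arrive as records `(u, v, ...)`, where anything after the two endpoints (e.g. a weight) is ignored. This input format is hypothetical and not taken from the databases. Extracting the largest component would follow as a separate step.

```python
def simple_undirected(edges):
    """Build an adjacency dict from (u, v, ...) records: direction and extra fields
    such as weights are ignored, self-loops and parallel edges are dropped."""
    adj = {}
    for u, v, *_ in edges:
        adj.setdefault(u, set())
        adj.setdefault(v, set())
        if u != v:                  # skip self-loops
            adj[u].add(v)           # sets make repeated (u, v) / (v, u) records collapse
            adj[v].add(u)
    return adj

raw = [("a", "b"), ("b", "a", 2.0), ("a", "a"), ("b", "c")]
adj = simple_undirected(raw)
print({v: sorted(nb) for v, nb in adj.items()})   # {'a': ['b'], 'b': ['a', 'c'], 'c': ['b']}
```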

We have studied the following five networks. 

• Protein-protein interaction network of yeast (PIN)

The data was obtained from the COSIN database [19]. In the PIN network each node represents a certain protein. An edge is placed between two proteins if an interaction between them has been observed in one of various experiments.

• Metabolic pathways [20] of the bacterium E. coli (ECOLI)

The ECOLI network was obtained using the API of the KEGG database […] together with the file reaction.lst from the KEGG LIGAND database. The latter is needed to separate the reactants and the products of a reaction, since the API only outputs which compounds are involved in the reaction. All compounds involved in reactions catalyzed by enzymes of E. coli are used as nodes. An edge is placed between two nodes if there exists a reaction that has one compound on one side and the other compound on the other side.

• Collaboration network of people working in computational geometry (GEOM)

In the GEOM network, obtained from Ref. [22], each node represents an author from the Computational Geometry Database. Two authors are connected by an edge if they wrote an article together.

• Network of autonomous systems (AS)

The AS network is a computer network extracted from the Internet via traceroutes. It contains routers as nodes and real-world connections between them as edges. These are in fact virtual connections, since a router's known-hosts table determines which nodes can be reached from a given point in the network. The data for AS was also obtained from the COSIN database [19].

• Actor collaboration network (ACTORS)

The data was obtained from the Internet Movie Database. Nodes represent actors. Since the database is very large, we restricted our study to films from the UK after 2002. Two nodes are connected by an edge if the corresponding actors appear in the same film.

Unfortunately, the ACTORS network did not yield meaningful results because the underlying data was quite "noisy". In movies with many actors listed in the database, even actors in minor parts get a high connectivity. Thus, for all measures given above we observed a large scatter of the data points and very small correlations between them. Furthermore, we doubt that defining a network of actors in this way is meaningful. Usually it is not the actors who decide with whom they interact in a film, but the producers who select the actors. Therefore we do not show any plots for this network type.

Note that all networks created in this way have fewer than 10,000 nodes, which allows the measures defined in Sec. II to be computed easily.



V. RESULTS 

For all random models we used a graph size of $n = 2000$ nodes and drew 100 representatives from the ensemble of possible networks. After calculating the four measures for each network, we averaged over all representatives to obtain smooth distributions for each measure and network type. For the real networks we simply calculated all measures for each given network, since no average can be performed here.

Since we consider four measures, we can produce 6 types of measure1-measure2 correlation plots for each graph model and each real-world graph. We studied the three graph ensembles for several values of their parameters, e.g. the edge probability $p$, so this amounts to several hundred possible plots. Many of these plots show strong correlations between the two quantities and give no further qualitative information. Hence, we restrict ourselves here to the most interesting cases, which also keeps the length of the paper reasonable.
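The analysis here judges correlations visually from scatter plots. If a single number per pair of measures is wanted, a Pearson coefficient over the per-node values is one simple option. This is our illustrative addition, not a quantity used in the paper. The sketch enumerates the $\binom{4}{2} = 6$ pairs of measures. The measure names and values in the usage example are made up, and each measure is assumed to be non-constant across nodes (otherwise the coefficient is undefined).

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equally long, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def pairwise_correlations(measures):
    """measures: dict name -> per-node values (same node order); one entry per pair."""
    return {(a, b): pearson(measures[a], measures[b])
            for a, b in combinations(sorted(measures), 2)}

example = {"degree": [1, 2, 3, 4], "betweenness": [2, 4, 6, 8],
           "rdw": [4, 3, 2, 1], "sc": [1, 1, 2, 2]}
corr = pairwise_correlations(example)
print(len(corr))                          # 6
print(corr[("betweenness", "degree")])    # 1.0 (exactly linear)
```

A single coefficient cannot show structures like the two branches or the $r > b$ inequality discussed below, so it complements the scatter plots rather than replacing them.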

For all Erdős–Rényi networks we always performed the analysis on the largest component. They show high, almost linear correlations between any two measures (not shown) for all probabilities $p \in \{0.05, 0.10, 0.15\}$ we investigated. The data points of any measure1-measure2 correlation plot lie on the data points of the averaged ensemble. This seems to indicate that the different measures are indeed equivalent to each other. It also suggests that, to characterize how important different nodes are, it might be sufficient to look at the degree, which is a local quantity and simple to calculate. Note that the ER model is the simplest model considered here. Below we will find examples, in particular among the real networks, that exhibit much more complex behavior.

Nevertheless, even the ER networks show a behavior in one case that appears very strange. We observe some sort of clustering in the correlation plot of betweenness against RDW betweenness, as can be seen in the scatter plot over all instances in Fig. 1 for the large edge probability $p = 0.1$. It seems that there are essentially two types of nodes, belonging to two different correlation functions. This splitting into two behaviors becomes more dominant the higher the edge probability $p$ of the generated networks, i.e. the more likely it is that each graph of the ensemble consists of only one component. In particular, for graphs with small average degrees up to 50, this behavior is hardly visible.

So far, we do not understand this kind of symmetry breaking. The two measures are identical on star networks, and the random-walk betweenness generally gives higher scores to nodes that lie slightly off the shortest paths in the network. Such local irregularities might therefore explain this behavior.

FIG. 1: Correlation of betweenness and RDW betweenness for 100 ER-networks with p = 0.10 and n = 2000.

For the Small-World model we studied the values $k \in \{8, 16, 24\}$ and $p \in \{0.05, 0.10, 0.15\}$. We usually observe moderately high correlations, but lower than for the ER model (not shown directly, see below). For the degree-RDW betweenness correlation, the data points are not uniformly distributed, similar to the betweenness-RDW betweenness correlation shown above for the ER graphs. This is easier to see when the correlation is displayed as a three-dimensional plot of impulses rather than a scatter plot, see Fig. 2. The "oscillation" that can be observed is also consistently present in the overall distribution of the RDW betweenness (averaged over all degrees). Hence, two types of vertices seem to be present here too, but the distinction is weaker than above.

Even for very small probabilities (e.g. $p = 0.0001$, $n = 2000$, $k = 8$) the two peaks are visible, though they are very close together. The gap between the two peaks grows with the rewiring probability $p$. The difference seems to be strongly related to the rewiring of the nodes. For $k = 8$, the degree of every node in the corresponding $p = 0$ network, the two peaks in the distributions are most clearly separated.

Even for very small networks of about 20 nodes, two different peaks can be seen. Consider, for example, the 20-node network shown in Fig. 3, where just one edge has been rewired. The node with the highest random-walk betweenness is the one that gained an additional edge by rewiring. Nodes with the same degree also get different values of the RDW betweenness, which decrease with growing distance from the most important node. This explains why the peaks get smeared out: the nodes that gain new edges influence those nodes whose degree stays the same. In general, even nodes that keep their degree but gain cross-links to nodes with high RDW betweenness obtain a similarly high RDW betweenness. One explanation of the peaks would therefore be the following. The lower peak is a smeared-out version of the single-value peak before rewiring. The peak at higher RDW betweenness values appears because rewired nodes get a much higher RDW betweenness.

FIG. 2: Correlation of degree and RDW betweenness for 100 SW-networks with p = 0.05, 8 nearest neighbors and n = 2000, with the corresponding RDW betweenness distribution.

FIG. 3: A SW-network with ten nodes. The numbers indicate the random-walk betweenness of a node. The node that gained an edge by rewiring obtains a much higher value than the others, while the node that lost an edge obtains a much lower value.
FIG. 4: Correlation of degree and RDW betweenness for 100 BA networks with m = 8 and n = 2000.

For the Barabási–Albert networks we studied the values $m \in \{8, 16, 24\}$. They again show almost linear correlations for all combinations of measures. Nevertheless, the correlations were not as clear as for the Erdős–Rényi networks: we observed a much larger scattering of the data, of the same order of magnitude as for the SW graphs. An example can be seen in Fig. 4. In contrast to the other two models, we did not observe any particularly strange correlation for any combination of measures.

To summarize the study of the correlations for the random graphs: most of the time we find a strong correlation between all centrality measures, so the degree is almost sufficient to characterize the importance of a node. (Results for the distributions, in particular the exponents in the case of power-law behavior, are given below.) This statement is certainly not true for many networks based on real-world data, as we will see next.

The AS network exhibits a positive correlation for all combinations of measures. Nevertheless, some aspects of its behavior differ strongly from the network models discussed previously. For example, Fig. 5 shows a scatter plot of betweenness against RDW betweenness. The scatter of the data points is always very small. This indicates that the fluctuations generated by the local structure around a node are always of the same order of magnitude, irrespective of the absolute value of a quantity. Even more interestingly, almost all data points obey the inequality $r > b$, where $r$ denotes the RDW betweenness and $b$ the shortest-path betweenness of a node. So far we have no explanation for this effect, which is not present in the data for the network models.

FIG. 5: Correlation of betweenness and RDW betweenness for the AS network.

The calculations on the PIN network showed high correlations for all combinations of measures (not shown) except for the degree-SC correlation plot, see Fig. 6(a). Here the data points lie on two "branches", which means there are two types of vertices. For one type, the number of closed walks increases exponentially with the degree. This is the behavior found for complete (sub)graphs (cliques), i.e. where each protein interacts with every other member of the (sub)graph. On the other hand, there are proteins whose participation in closed walks does not increase at all with the degree. These proteins, although they may have a large number of interaction partners, participate only loosely in the overall interaction network. For large degrees there even seem to be proteins that interpolate between the two limiting behaviors.

This behavior is quite different from what one finds, for example, for the BA networks. It hints that the structure of this network cannot be modeled with BA networks, although its degree distribution, which we have measured as well (see below), is still scale-free. To illustrate this, we fitted a BA network's degree distribution as closely as possible to the degree distribution of the PIN network. The best fit was obtained for $m = 2$ and $k_0 = -1$, although the BA networks for these parameters generally have a smaller maximum degree than the PIN network. As can be seen in Fig. 6(b), the correlation plots look completely different. For different values of $m$ and $k_0$ the scales of the axes change, but the general behavior is consistent. In particular, the SC then yields much higher values, of the same order of magnitude as for the PIN network.

A model for such an interaction network would have to take the existence of two types of proteins into account, resulting in two different rules for the creation of the nodes. A recent study of the PIN network that also uses a few centrality measures found that a high subgraph centrality is a better indicator of essential proteins than, for example, the degree. This fits nicely with our result that the degree and the subgraph centrality are not strongly correlated in this case.

FIG. 6: Comparison of the degree-SC correlation between the BA model and the PIN network. (a) The PIN network. (b) A BA network with a similar degree distribution.

FIG. 7: Correlation of degree and RDW betweenness for the GEOM network.

FIG. 8: Correlation of betweenness and RDW betweenness for the GEOM network.

For the GEOM network the measure1-measure2 correlation plots show quite scattered behavior, i.e. much weaker correlations than in the network models, see e.g. Fig. 7. Here we also observe the $r > b$ feature in the betweenness-RDW betweenness correlation plot, see Fig. 8, and the effect is even stronger than in Fig. 5. Hence, this inequality might be a property of many networks based on real-world data, and it certainly deserves a more thorough investigation.

For the ECOLI network, the correlations between the measures range from moderate to high, see e.g. Fig. 9. In principle, the plots look quite similar to those of the AS network. We could not observe any particular new properties here, so we do not go into further detail for this network type.



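Name     Degree    Betweenness    RDW betweenness
AS       1.54(4)   1.66(3)        1.55(3)
GEOM     2.34(6)   1.86(5)        1.51(4)
ECOLI    2.87(9)   2.18(9)        3.1(1)
PIN      1.65(4)   1.82(4)        1.66(3)

TABLE I: Power-law exponents of the distributions of degree, betweenness and RDW betweenness for the real-world networks (uncertainty of the last digit in parentheses).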
FIG. 9: Correlations of betweenness and RDW betweenness
on the ECOLI-network 



Finally, we look at the distributions of the centrality measures alone for the real-world networks. All the real networks show scale-free behavior, see e.g. Fig. 10. We fitted power laws $P(x) \propto x^{-\gamma}$ to all data except the subgraph centrality, whose values were distributed over only a small interval, so that a fit would be meaningless. The power-law exponents we calculated are listed in Table I. This shows that when just looking at the distributions of centrality measures, the behavior of the real-world networks is also found for the BA model. Goh et al. also found that the betweenness distribution follows a power law for this model. Only when considering correlations between different measures does one realize that the existing models have to be extended or modified, although they have provided much valuable insight. Such extensions are needed to really capture the behavior of proteins, metabolic pathways, humans and other systems represented by networks.
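The text does not state how the power laws were fitted. One common approach, assumed here for illustration only, is a least-squares straight line through a logarithmically binned histogram on a log-log scale. A maximum-likelihood estimator would be a reasonable alternative. The number of bins per decade (`bins_per_decade`) and the minimum count for a bin to enter the fit (`min_count`) are arbitrary choices of this sketch. The data must be positive and not all equal. To make the estimate checkable, the usage example draws synthetic samples from a known power law by inverse-transform sampling.

```python
import math
import random

def fit_power_law(values, bins_per_decade=10, min_count=10):
    """Estimate gamma in P(x) ~ x^(-gamma) from a least-squares line through a
    logarithmically binned density histogram (log-log scale)."""
    xs = [v for v in values if v > 0]
    lo, hi = math.log10(min(xs)), math.log10(max(xs))   # assumes hi > lo
    nbins = max(1, math.ceil((hi - lo) * bins_per_decade))
    edges = [10 ** (lo + (hi - lo) * i / nbins) for i in range(nbins + 1)]
    counts = [0] * nbins
    for v in xs:
        i = min(int((math.log10(v) - lo) / (hi - lo) * nbins), nbins - 1)
        counts[i] += 1
    pts = []
    for i, c in enumerate(counts):
        if c >= min_count:                                # sparse tail bins are mostly noise
            width = edges[i + 1] - edges[i]
            centre = math.sqrt(edges[i] * edges[i + 1])   # geometric bin centre
            pts.append((math.log10(centre), math.log10(c / (len(xs) * width))))
    # ordinary least-squares slope of log P(x) against log x
    mx = sum(a for a, _ in pts) / len(pts)
    my = sum(b for _, b in pts) / len(pts)
    slope = sum((a - mx) * (b - my) for a, b in pts) / sum((a - mx) ** 2 for a, _ in pts)
    return -slope

rng = random.Random(1)
gamma_true = 2.5
# inverse-transform samples from a continuous power law with x_min = 1
samples = [(1.0 - rng.random()) ** (-1.0 / (gamma_true - 1.0)) for _ in range(100_000)]
print(round(fit_power_law(samples), 2))   # should come out close to 2.5
```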
VI. CONCLUSION



In this paper we have studied four different local and global centrality measures to analyze the behavior of different model and real-world networks. First, which measure is "most suitable" depends on the network under study and on which kind of information the measure is meant to extract. There does not seem to be an overall best measure that is optimal for all applications. The shortest-path betweenness might be appropriate if the network can be assumed to contain global knowledge of optimal routes. Even in this case, however, when there is much traffic on the network, it is often advisable to use non-shortest paths to reach the destination as quickly as possible. Where participation in social subgroups is of interest, the subgraph centrality might be best. Where each node only passes information randomly to its nearest neighbors, the random-walk betweenness should be the method of choice.

Nevertheless, to really understand how a network is organized, it does not seem sufficient to study just one measure and its distribution. We have seen that for all real-world networks considered here, the distributions of all measures are indeed well described by power laws. But the correlations between different centrality measures show that the most common random network models reflect the truth only partially, since their scatter plots look quite different from those of the real networks.

It seems that network models have to be more specific to each application. A single mechanism like preferential attachment, at least when used as the only mechanism to create the graph, is too simple to explain the complex properties of real-world networks. Models that incorporate the evolution and growth of networks, as presented for example in […], might be the key to a deeper understanding. They could explain why many networks show scale-free behavior for one of their properties and still differ from simpler models like the BA model. Since each application will need its own mechanism to generate a network, proposing new models for specific applications is beyond the scope of this work.



FIG. 10: RDW betweenness distribution for the GEOM net- 
work. 